Foreword

This Technical Specification (TS) has been produced by the ETSI 3rd Generation Partnership Project (3GPP).

The present document may refer to technical specifications or reports using their 3GPP identities or GSM identities.
These should be interpreted as being references to the corresponding ETSI deliverables. The mapping of document
identities is as follows:

For 3GPP documents:

3G TS | TR nn.nnn "<title>" (with or without the prefix 3G)

is equivalent to

ETSI TS | TR 1nn nnn "[Digital cellular telecommunications system (Phase 2+) (GSM);] Universal Mobile
Telecommunications System; <title>"

For GSM document identities of type "GSM xx.yy", e.g. GSM 01.04, the corresponding ETSI document identity may be 
found in the Cross Reference List on www.etsi.org/key 
Foreword

This Technical Specification has been produced by the 3GPP.

The present document defines the detailed requirements for the correct operation of the background acoustic noise
evaluation, noise parameter encoding/decoding and comfort noise generation in the narrowband telephony speech
service employing the Adaptive Multi-Rate (AMR) speech coder within the 3GPP system.

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying 
change of release date and an increase in version number as follows: 

Version 3.y.z 

where: 

x the first digit:

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 Indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the specification.
1 Scope



This document gives the detailed requirements for the correct operation of the background acoustic noise evaluation, 
noise parameter encoding/decoding and comfort noise generation for the AMR speech codec during Source Controlled 
Rate (SCR) operation. 



The requirements described in this document are mandatory for implementation in all UEs capable of supporting the 
AMR speech codec. 



The receiver requirements are mandatory for implementation in all networks capable of supporting the AMR speech 
codec, the transmitter requirements only for those where downlink SCR will be used. 



In case of discrepancy between the requirements described in this document and the fixed point computational 
description of these requirements contained in [1], the description in [1] will prevail. 



2 Normative references



This document incorporates by dated and undated reference, provisions from other publications. These normative 
references are cited at the appropriate places in the text and the publications are listed hereafter. For dated references, 
subsequent amendments to or revisions of any of these publications apply to this document only when incorporated in it 
by amendment or revision. For undated references, the latest edition of the publication referred to applies. 

[1] 3G TS 26.073: "AMR Speech Codec; ANSI-C code".

[2] 3G TS 26.090: "AMR Speech Codec; Transcoding functions".

[3] 3G TS 26.091: "AMR Speech Codec; Error concealment of lost frames".

[4] 3G TS 26.093: "AMR Speech Codec; Source Controlled Rate operation".

[5] 3G TS 26.101: "AMR Speech Codec; Frame Structure".



ETSI 



(3G TS 26.092 version 3.0.1 Release 1999) 



ETSI TS 126 092 V3.0.1 (2000-01) 



3 Definitions, symbols and abbreviations

3.1 Definitions



For the purpose of this document, the following definitions apply. 

Frame: Time interval of 20 ms corresponding to the time segmentation of the adaptive multi-rate speech transcoder, 
also used as a short term for traffic frame. 

SID frame: Special comfort noise frame. It may convey information on the acoustic background noise or inform the
decoder that it should start generating background noise.

Speech frame: Traffic frame that cannot be classified as a SID frame. 

VAD flag: Voice Activity Detection flag. 

TX_TYPE: one of SPEECH, SID_FIRST, SID_UPD, NO_DATA (defined in [4]). 

RX_TYPE: Classification of the received traffic frame (defined in [4]). 

Other definitions of terms used in this document can be found in [2] and [4]. The overall operation of SCR is described
in [4].



3.2 Symbols 

For the purpose of this document, the following symbols apply. Boldface symbols are used for vector variables.

$\mathbf{f} = [f_1\ f_2\ \ldots\ f_{10}]$    Unquantized LSF vector

$\hat{\mathbf{f}} = [\hat{f}_1\ \hat{f}_2\ \ldots\ \hat{f}_{10}]$    Quantized LSF vector

$\mathbf{f}^{(m)}$    Unquantized LSF vector of frame m

$\hat{\mathbf{f}}^{(m)}$    Quantized LSF vector of frame m

$\mathbf{f}^{mean}$    Averaged LSF parameter vector

$en_{log}$    Logarithmic frame energy

$en_{log}^{mean}$    Averaged logarithmic frame energy

$\hat{\mathbf{f}}^{ref}$    Reference vector for LSF quantization

$\mathbf{e}$    Computed LSF parameter prediction residual

$\hat{\mathbf{e}}$    Quantized LSF parameter prediction residual

$\sum_{n=a}^{b} x(n) = x(a) + x(a+1) + \ldots + x(b-1) + x(b)$



3.3 Abbreviations

For the purpose of this document, the following abbreviations apply.

AMR Adaptive Multi-Rate

SCR Source Controlled Rate operation (also known as source discontinuous transmission)

UE User Equipment 

SID Silence Descriptor 

LP Linear Prediction 

LSP Line Spectral Pair 

LSF Line Spectral Frequency 

RX Receive 

TX Transmit 

VAD Voice Activity Detector 



4 General 

A basic problem when using SCR is that the background acoustic noise, which is transmitted together with the speech, 
would disappear when the transmission is cut, resulting in discontinuities of the background noise. Since the SCR 
switching can take place rapidly, it has been found that this effect can be very annoying for the listener - especially in a 
car environment with high background noise levels. In bad cases, the speech may be hardly intelligible. 



This document specifies the way to overcome this problem by generating on the receive (RX) side synthetic noise 
similar to the transmit (TX) side background noise. The comfort noise parameters are estimated on the TX side and 
transmitted to the RX side at a regular rate when speech is not present. This allows the comfort noise to adapt to the 
changes of the noise on the TX side. 



5 Functions on the transmit (TX) side 

The comfort noise evaluation algorithm uses the following parameters of the AMR speech encoder, defined in [2]:

- the unquantized Linear Prediction (LP) parameters, using the Line Spectral Pair (LSP) representation, where the
unquantized Line Spectral Frequency (LSF) vector is given by $\mathbf{f} = [f_1\ f_2\ \ldots\ f_{10}]$;

- the unquantized LSF vector for the 12.2 kbit/s mode is given by the second set of LSF parameters in the frame.

The algorithm computes the following parameters to assist in comfort noise generation:

- the averaged LSF parameter vector $\mathbf{f}^{mean}$ (average of the LSF parameters of the eight most recent frames);

- the averaged logarithmic frame energy $en_{log}^{mean}$ (average of the logarithmic energy of the eight most recent
frames).

These parameters give information on the level ($en_{log}^{mean}$) and the spectrum ($\mathbf{f}^{mean}$) of the background noise.

The evaluated comfort noise parameters ($\mathbf{f}^{mean}$ and $en_{log}^{mean}$) are encoded into a special frame, called a Silence
Descriptor (SID) frame, for transmission to the RX side.

A hangover logic is used to enhance the quality of the silence descriptor frames. A hangover of seven frames is added to
the VAD flag so that the coder waits with the switch from active to inactive mode for a period of seven frames. During
that time the decoder can compute a silence descriptor frame from the quantized LSFs and the logarithmic frame energy
of the decoded speech signal. Therefore, no comfort noise description is transmitted in the first SID frame after active
speech. If the background noise contains transients which cause the coder to switch to active mode and then back to
inactive mode within a very short time period, no hangover is used. Instead, the previously used comfort noise frames
are used for comfort noise generation.






The first SID frame also serves to initiate the comfort noise generation on the receive side, as a first SID frame is 
always sent at the end of a speech burst, i.e., before the transmission is terminated. 

The scheduling of SID or speech frames on the network path is described in [4]. 

5.1 LSF evaluation 

The comfort noise parameters to be encoded into a SID frame are calculated over N = 8 consecutive frames marked
with VAD=0, as follows:

The averaged LSF parameter vector $\mathbf{f}^{mean}(i)$ of the frame $i$ shall be computed according to the equation:

$\mathbf{f}^{mean}(i) = \frac{1}{8}\sum_{n=0}^{7}\mathbf{f}(i-n)$    (1)

where $\mathbf{f}(i-n)$ is the (unquantized) LSF parameter vector of the current frame $i$ ($n = 0$) and past frames
($n = 1,\ldots,7$).

The averaged LSF parameter vector $\mathbf{f}^{mean}(i)$ of the frame $i$ is encoded using the same encoding tables that are also
used by the 7.4 kbit/s mode for the encoding of the non-averaged LSF parameter vectors in ordinary speech encoding
mode, but the quantization algorithm is modified in order to support the quantization of comfort noise.



The LSF parameter prediction residual to be quantized for frame $i$ is obtained according to the following equation:

$\mathbf{e}(i) = \mathbf{f}^{mean}(i) - \hat{\mathbf{f}}^{ref}$    (2)

where $\hat{\mathbf{f}}^{ref}$ is a reference vector picked from a codebook.

The vector $\hat{\mathbf{f}}^{ref}$ used in equation (2) is encoded for each SID frame. A lookup table containing 8 vectors typical for
background noise is searched, and the vector which yields the lowest prediction residual energy is selected. After the
above step the LSF parameter encoding procedure is performed. The 3-bit index for the reference vector and the 26 bits
for the LSF parameters are transmitted in the SID frame (see bit allocation in table 1).
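
The averaging of equation (1) and the reference vector search can be sketched in C as below. This is a floating-point illustration only, not the bit-exact fixed-point procedure of [1]. The names lsf_average, select_reference and ref_table are hypothetical, and the contents of the 8-entry lookup table are not given in this document, so the caller has to supply them.

```c
#include <stddef.h>

#define LSF_ORDER 10   /* LSF vector length, f = [f1 ... f10] */
#define N_AVG      8   /* frames averaged, N = 8 in equation (1) */
#define N_REF      8   /* reference vectors in the lookup table */

/* Equation (1): mean of the LSF vectors of the N_AVG most recent frames.
 * hist[0] is the current frame i, hist[n] is frame i-n. */
void lsf_average(float hist[N_AVG][LSF_ORDER], float mean[LSF_ORDER])
{
    for (size_t k = 0; k < LSF_ORDER; k++) {
        float sum = 0.0f;
        for (size_t n = 0; n < N_AVG; n++)
            sum += hist[n][k];
        mean[k] = sum / (float)N_AVG;
    }
}

/* Equation (2) plus the table search: choose the reference vector whose
 * residual e = mean - ref has the lowest energy, write that residual to e
 * and return the 3-bit index. ref_table is supplied by the caller
 * (hypothetical contents; the real table is part of [1]). */
int select_reference(const float mean[LSF_ORDER],
                     float ref_table[N_REF][LSF_ORDER],
                     float e[LSF_ORDER])
{
    int best = 0;
    float best_energy = -1.0f;   /* negative means "nothing tested yet" */

    for (int r = 0; r < N_REF; r++) {
        float energy = 0.0f;
        for (size_t k = 0; k < LSF_ORDER; k++) {
            float d = mean[k] - ref_table[r][k];
            energy += d * d;
        }
        if (best_energy < 0.0f || energy < best_energy) {
            best_energy = energy;
            best = r;
        }
    }
    for (size_t k = 0; k < LSF_ORDER; k++)
        e[k] = mean[k] - ref_table[best][k];
    return best;
}
```

The residual written to e is what the subsequent LSF parameter encoding procedure would quantize; on ties the sketch keeps the lowest index, which is an arbitrary choice.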

5.2 Frame energy calculation 

The frame energy is computed for each frame marked with VAD=0 according to the equation:

$en_{log}(i) = \frac{1}{2}\log_2\left(\frac{1}{N}\sum_{n=0}^{N-1} s^2(n)\right)$    (3)

where $s(n)$ is the HP-filtered input speech signal of the current frame $i$.

The averaged logarithmic energy is computed by:

$en_{log}^{mean}(i) = \frac{1}{8}\sum_{n=0}^{7} en_{log}(i-n)$    (4)

The averaged logarithmic energy is quantized by means of a 6-bit algorithmic quantizer. The 6 bits for the energy index are
transmitted in the SID frame (see bit allocation in table 1).
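
A short worked example may help calibrate the logarithmic scale. It assumes that the frame energy is one half of the base-2 logarithm of the mean squared sample value, and it uses a hypothetical frame of constant amplitude A. Neither the test signal nor the numbers come from this document.

```latex
en_{log}(i) = \frac{1}{2}\log_2\left(\frac{1}{N}\sum_{n=0}^{N-1} A^2\right)
            = \frac{1}{2}\log_2 A^2
            = \log_2 |A|

% A = 1024 gives en_{log} = 10; A = 2048 gives en_{log} = 11.
% Four frames at each level then average to
en_{log}^{mean} = \frac{1}{8}\left(4 \cdot 10 + 4 \cdot 11\right) = 10.5
```

Under this assumption each doubling of the signal amplitude raises the logarithmic frame energy by one unit, and the eight-frame average is taken in the logarithmic domain rather than on linear energies.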
5.3 Modification of the speech encoding algorithm during SID frame generation

When the TX_TYPE is not equal to SPEECH the speech encoding algorithm is modified in the following way: 

- The non-averaged LP parameters which are used to derive the filter coefficients of the filters H(z) and W(z)
of the speech encoder are not quantized;

- The open loop pitch lag search is performed, but the closed loop pitch lag search is inactivated. The adaptive
codebook gain and memory are set to zero.

- No fixed codebook search is made.

- The memory of weighting filter W(z) is set to zero, i.e., the memory of W(z) is not updated.

- The ordinary LP parameter quantization algorithm is inactive. The averaged LSF parameter vector $\mathbf{f}^{mean}$ is
calculated each time a new SID frame is to be sent to the AN. This parameter vector is encoded into the SID
frame as defined in subclause 5.1.

- The ordinary gain quantization algorithm is inactive. 

- The predictor memories of the ordinary LP parameter quantization and fixed codebook gain quantization 
algorithms are initialized when TX_TYPE is not SPEECH, so that the quantizers start from known initial states 
when the speech activity begins again. 



5.4 SID-frame encoding 



The encoding of the comfort noise bits in a SID frame is described in [5], where the indication of the first SID frame is
also described. The bit allocation and sequence of the bits from comfort noise encoding is shown in Table 1.



6 Functions on the receive (RX) side



The situations in which comfort noise shall be generated on the receive side are defined in [4]. In general, the comfort 
noise generation is started or updated whenever a valid SID frame is received. 

6.1 Averaging and decoding of the LP and energy parameters

When speech frames are received by the decoder, the LP and the energy parameters of the last seven speech frames shall
be kept in memory. The decoder counts the number of frames elapsed since the last SID frame was updated and passed
to the RSS by the encoder. Based on this count, the decoder determines whether or not there is a hangover period at the
end of the speech burst (defined in [4]). The interpolation factor is also adapted to the SID update rate.

As soon as a SID frame is received, comfort noise is generated at the decoder end. The first SID frame parameters are
not received but computed from the parameters stored during the hangover period. If no hangover period is detected, the
parameters from the previous SID update are used.

The averaging procedure for obtaining the comfort noise parameters for the first SID frame is as follows:

- when a speech frame is received, the LSF vector is decoded and stored in memory; the logarithmic
frame energy of the decoded signal is also stored in memory;

- the averaged values of the quantized LSF vectors and the averaged logarithmic frame energy of the decoded
frames are computed and used for comfort noise generation.

The averaged value of the LSF vector for the first SID frame is given by:

$\mathbf{f}^{mean}(i) = \frac{1}{8}\sum_{n=0}^{7}\hat{\mathbf{f}}(i-n)$    (5)

where $\hat{\mathbf{f}}(i-n)$, $n > 0$, is the quantized LSF vector of one of the frames of the hangover period and where
$\hat{\mathbf{f}}(i-0) = \hat{\mathbf{f}}(i-1)$. The averaged logarithmic frame energy for the first SID frame is given by:

$en_{log}^{mean}(i) = \frac{1}{8}\sum_{n=0}^{7} en_{log}(i-n)$    (6)

where $en_{log}(i-n)$, $n > 0$, is the logarithmic frame energy of one of the frames of the hangover period computed for the
decoded frames and where $en_{log}(i-0) = en_{log}(i-1)$.
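
The first-SID averaging of equations (5) and (6) can be sketched as follows, assuming a sum over n = 0..7 scaled by 1/8 in which the n = 0 term duplicates the n = 1 term. This is a floating-point illustration, not the bit-exact code of [1]; first_sid_average and the buffer layout are hypothetical.

```c
#include <stddef.h>

#define LSF_ORDER   10  /* LSF vector length */
#define N_HANGOVER   7  /* frames kept in memory during the hangover period */

/* Equations (5) and (6): the n = 0 term is a copy of the n = 1 term, so the
 * most recent stored frame is counted twice and the sum has eight terms.
 * lsf_hist[0] / en_hist[0] hold frame i-1; index 6 holds frame i-7. */
void first_sid_average(float lsf_hist[N_HANGOVER][LSF_ORDER],
                       const float en_hist[N_HANGOVER],
                       float lsf_mean[LSF_ORDER], float *en_mean)
{
    for (size_t k = 0; k < LSF_ORDER; k++) {
        float sum = lsf_hist[0][k];            /* n = 0 term, equal to n = 1 */
        for (size_t n = 0; n < N_HANGOVER; n++)
            sum += lsf_hist[n][k];             /* n = 1..7 */
        lsf_mean[k] = sum / 8.0f;
    }

    float en_sum = en_hist[0];                 /* n = 0 term, equal to n = 1 */
    for (size_t n = 0; n < N_HANGOVER; n++)
        en_sum += en_hist[n];                  /* n = 1..7 */
    *en_mean = en_sum / 8.0f;
}
```

Counting the newest frame twice lets the decoder form an eight-term average from the seven frames it has stored, matching the eight-frame average used on the TX side.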

For ordinary SID frames, the LSF vector and logarithmic frame energy are computed by table lookup. The energy is
also adjusted according to the signalled speech mode capabilities, so as to provide high quality transitions from comfort
noise to speech. The LSF vector is given by the sum of the decoded reference vector and the decoded LSF prediction
residual.

During comfort noise generation the spectrum and energy of the comfort noise are determined by interpolation between
old and new SID frames.

In order to achieve a comfort noise that is less static in appearance, the LSF vector is slightly perturbed for each frame
by adding a small component based on parameter variations computed in the hangover period. The perturbation is
computed as follows: the mean LSF vector is computed from the matrix $\hat{\mathbf{f}}$ of LSF vectors stored during the hangover
period; this mean vector is then subtracted from each of the elements of $\hat{\mathbf{f}}$, forming a new matrix $\hat{\mathbf{f}}'$. For every
frame a mean-removed LSF vector is randomly chosen from $\hat{\mathbf{f}}'$ and added to the interpolated LSF vector.
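
The mean removal and the per-frame perturbation can be sketched as below. The number of stored vectors, the function names remove_mean and perturb_lsf, and the way the random index is supplied are all assumptions made for illustration; any scaling or stability checks applied by the bit-exact code in [1] are not shown.

```c
#include <stddef.h>

#define LSF_ORDER 10   /* LSF vector length */

/* Subtract the per-coefficient mean from every stored LSF vector, in place.
 * hist has `rows` vectors; after the call each column sums to zero. */
void remove_mean(float (*hist)[LSF_ORDER], size_t rows)
{
    for (size_t k = 0; k < LSF_ORDER; k++) {
        float mean = 0.0f;
        for (size_t n = 0; n < rows; n++)
            mean += hist[n][k];
        mean /= (float)rows;
        for (size_t n = 0; n < rows; n++)
            hist[n][k] -= mean;
    }
}

/* Add one mean-removed vector to the interpolated LSF vector. `pick` is a
 * random number provided by the caller; it is reduced modulo `rows`. */
void perturb_lsf(float lsf[LSF_ORDER], float (*dev)[LSF_ORDER],
                 size_t rows, unsigned pick)
{
    const float *d = dev[pick % rows];
    for (size_t k = 0; k < LSF_ORDER; k++)
        lsf[k] += d[k];
}
```

Because the stored deviations have zero mean, the perturbation varies the spectrum from frame to frame without shifting its long-term average.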

6.2 Comfort noise generation and updating

The comfort noise generation procedure uses the adaptive multi-rate speech decoder algorithm defined in [2].

When comfort noise is to be generated, the various encoded parameters are set as follows: 

In each subframe, the pulse positions and signs of the fixed codebook excitation are locally generated using uniformly 
distributed pseudo random numbers. The excitation pulses take values of +1 and -1 when comfort noise is generated. 
The fixed codebook comfort noise excitation generation algorithm works as follows: 



for (i = 0; i < 40; i++) code[i] = 0;
for (i = 0; i < 10; i++) {
    j = random(4);
    idx = j * 10 + i;
    if (random(2) == 1) code[idx] = 1;
    else code[idx] = -1;
}

where:

code[0..39]   fixed codebook excitation buffer;

random(4)     generates a random integer value, uniformly distributed between 0 and 3;

random(2)     generates a random integer value, uniformly distributed between 0 and 1.
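
A self-contained version of the excitation loop is sketched below so that it can be compiled and inspected. The generator random_below is a hypothetical stand-in (a simple linear congruential generator); the bit-exact pseudo random generator is the one in the ANSI-C code of [1] and will produce different sequences.

```c
#include <stdint.h>

#define L_SUBFR   40   /* excitation buffer length, code[0..39] */
#define N_PULSES  10   /* pulses placed per subframe */

/* Hypothetical stand-in for random(n): returns an integer in 0..n-1.
 * Not the generator used by the reference code in [1]. */
static uint32_t prng_state = 1u;

int random_below(int n)
{
    prng_state = prng_state * 1664525u + 1013904223u;
    return (int)((prng_state >> 16) % (uint32_t)n);
}

/* Fill code[] with ten pulses of value +1 or -1, all other samples zero.
 * Pulse i lands on one of the positions i, i+10, i+20 or i+30. */
void build_cn_excitation(int code[L_SUBFR])
{
    for (int i = 0; i < L_SUBFR; i++)
        code[i] = 0;

    for (int i = 0; i < N_PULSES; i++) {
        int j = random_below(4);          /* which of the four positions */
        int idx = j * 10 + i;
        code[idx] = (random_below(2) == 1) ? 1 : -1;
    }
}
```

Since idx modulo 10 equals i, the ten pulses can never collide, so every subframe carries exactly ten non-zero samples regardless of the random sequence.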

The fixed codebook gain is computed from the logarithmic frame energy parameter by converting it to the linear 
domain and normalizing with the gain of LP synthesis filter. 

The adaptive codebook gain values in each subframe are set to 0, also the memory of the adaptive codebook is set to 
zero. 

The pitch delay values in each subframe are set to 40. 

The LP filter parameters used are those received in the SID frame. 

The predictor memories of the ordinary LP parameter and fixed codebook gain quantization algorithms are initialized
when RX_TYPE is not SPEECH, so that the quantizers start from given initial states when the speech activity begins
again. With these parameters, the speech decoder now performs the standard operations described in [2] and synthesizes
comfort noise.

Updating of the comfort noise parameters (energy and LP filter parameters) occurs each time a valid SID frame is
received, as described in [4].

When updating the comfort noise, the parameters above should be interpolated over the SID update period to obtain 
smooth transitions. 
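
The interpolation rule itself is not spelled out in this document. The sketch below assumes plain linear interpolation between the previous and the latest SID parameters, with the frame position k and the update period passed in by the caller; the function names are hypothetical and the bit-exact behaviour is defined by [1].

```c
#define LSF_ORDER 10   /* LSF vector length */

/* Linear interpolation of one parameter (assumed rule, see text).
 * k is the number of frames since the latest SID update arrived;
 * for k <= 0 the old value is used, for k >= period the new one. */
float interp_param(float old_val, float new_val, int k, int period)
{
    if (k <= 0)
        return old_val;
    if (k >= period)
        return new_val;
    float w = (float)k / (float)period;
    return (1.0f - w) * old_val + w * new_val;
}

/* Apply the same weighting to every element of the LSF vector. */
void interp_lsf(const float old_lsf[LSF_ORDER], const float new_lsf[LSF_ORDER],
                int k, int period, float out[LSF_ORDER])
{
    for (int n = 0; n < LSF_ORDER; n++)
        out[n] = interp_param(old_lsf[n], new_lsf[n], k, period);
}
```

The logarithmic frame energy would be passed through interp_param in the same way, so that both level and spectrum move gradually towards the most recent SID update.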



7 Computational details and bit allocation 

A bit exact computational description of comfort noise encoding and generation in form of an ANSI-C source code is 
found in [1]. 

The detailed bit allocation and the sequence of bits in the comfort noise encoding is shown in Table 1.

Table 1: Source encoder output parameters in order of occurrence and bit allocation for comfort noise encoding.

Bits (MSB-LSB)    Description
s1 - s3           index of reference vector
s4 - s11          index of 1st LSF subvector
s12 - s20         index of 2nd LSF subvector
s21 - s29         index of 3rd LSF subvector
s30 - s35         index of logarithmic frame energy
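
The bit allocation of Table 1 can be illustrated with a small pack/unpack pair that stores one bit per array element, s1 first and MSB first within each field. The structure SidParams and the function names are hypothetical; the mapping of these 35 bits into the transmitted frame is defined in [5] and is not reproduced here.

```c
#include <stdint.h>

#define SID_BITS 35   /* 3 + 8 + 9 + 9 + 6 bits, s1..s35 */

typedef struct {
    unsigned ref_idx;  /* s1-s3:   index of reference vector */
    unsigned lsf1;     /* s4-s11:  index of 1st LSF subvector */
    unsigned lsf2;     /* s12-s20: index of 2nd LSF subvector */
    unsigned lsf3;     /* s21-s29: index of 3rd LSF subvector */
    unsigned energy;   /* s30-s35: index of logarithmic frame energy */
} SidParams;

/* Write `width` bits of value, MSB first, starting at s[pos]. */
static int put_bits(uint8_t *s, int pos, unsigned value, int width)
{
    for (int b = width - 1; b >= 0; b--)
        s[pos++] = (uint8_t)((value >> b) & 1u);
    return pos;
}

/* Read `width` bits, MSB first, advancing *pos. */
static unsigned get_bits(const uint8_t *s, int *pos, int width)
{
    unsigned v = 0;
    for (int b = 0; b < width; b++)
        v = (v << 1) | s[(*pos)++];
    return v;
}

void sid_pack(const SidParams *p, uint8_t s[SID_BITS])
{
    int pos = 0;
    pos = put_bits(s, pos, p->ref_idx, 3);
    pos = put_bits(s, pos, p->lsf1, 8);
    pos = put_bits(s, pos, p->lsf2, 9);
    pos = put_bits(s, pos, p->lsf3, 9);
    (void)put_bits(s, pos, p->energy, 6);
}

void sid_unpack(const uint8_t s[SID_BITS], SidParams *p)
{
    int pos = 0;
    p->ref_idx = get_bits(s, &pos, 3);
    p->lsf1    = get_bits(s, &pos, 8);
    p->lsf2    = get_bits(s, &pos, 9);
    p->lsf3    = get_bits(s, &pos, 9);
    p->energy  = get_bits(s, &pos, 6);
}
```

The three LSF subvector indices account for the 26 LSF bits mentioned in subclause 5.1, and together with the 3-bit reference index and the 6-bit energy index they make up the 35 comfort noise bits.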